Evaluation Techniques for Automatic SemanticExtraction : Comparing Syntactic and Window

نویسنده

  • Gregory Grefenstette
چکیده

As large on-line corpora become more prevalent, a number of attempts have been made to automatically extract thesaurus-like relations directly from text using knowledge poor methods. In the absence of any speciic application, comparing the results of these attempts is diicult. Here we propose an evaluation method using gold standards , i.e., pre-existing hand-compiled resources, as a means of comparing extraction techniques. Using this evaluation method, we compare two semantic extraction techniques which produce similar word lists, one using syntactic context of words , and the other using windows of heuristically tagged words. The two techniques are very similar except that in one case selective natural language processing, a partial syntactic analysis, is performed. On a 4 megabyte corpus, syntactic contexts produce signiicantly better results against the gold standards for the most characteristic words in the corpus, while windows produce better results for rare words.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluation Techniques For Automatic Semantic Extraction: Comparing Syntactic And Window Based Approaches

As large on-line corpora become more prevalent, a number of attempts have been made to automatically extract thesaurus-like relations directly from text using knowledge poor methods. In the absence of any specific application, comparing the results of these attempts is difficult. Here we propose an evaluation method using gold standards, i.e., pre-existing hand-compiled resources, as a means of...

متن کامل

Automatic Evaluation of Summary Using Textual Entailment

This paper describes about an automatic technique of evaluating summary. The standard and popular summary evaluation techniques or tools are not fully automatic; they all need some manual process. Using textual entailment (TE) the generated summary can be evaluated automatically without any manual evaluation/process. The TE system is the composition of lexical entailment module, lexical distanc...

متن کامل

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...

متن کامل

Entailment-based Fully Automatic Technique for Evaluation of Summaries

We propose a fully automatic technique for evaluating text summaries without the need to prepare the gold standard summaries manually. A standard and popular summary evaluation techniques or tools are not fully automatic; they all need some manual process or manual reference summary. Using recognizing textual entailment (TE), automatically generated summaries can be evaluated completely automat...

متن کامل

Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output

Evaluation of machine translation output is an important but difficult task. Over the last years, a variety of automatic evaluation measures have been studied, some of them like Word Error Rate (WER), Position Independent Word Error Rate (PER) and BLEU and NIST scores have become widely used tools for comparing different systems as well as for evaluating improvements within one system. However,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1993